Identity-sensitive Word Embedding through Heterogeneous Networks

Authors

  • Jian Tang
  • Meng Qu
  • Qiaozhu Mei
Abstract

Most existing word embedding approaches do not distinguish the same word in different contexts and therefore ignore its contextual meanings. As a result, the learned embedding of such a word is usually a mixture of multiple meanings. In this paper, we acknowledge the multiple identities of the same word in different contexts and learn identity-sensitive word embeddings. Based on an identity-labeled text corpus, a heterogeneous network of words and word identities is constructed to model different levels of word co-occurrence. The heterogeneous network is further embedded into a low-dimensional space through a principled network embedding approach, through which we obtain both the embeddings of words and the embeddings of word identities. We study three different types of word identities: topics, sentiments, and categories. Experimental results on real-world data sets show that the identity-sensitive word embeddings learned by our approach indeed capture different meanings of words and outperform competitive methods on tasks including text classification and word similarity computation.
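The network-construction step described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each token already carries exactly one identity label (a topic, sentiment, or category), that co-occurrence is counted in a fixed sliding window, and that the network has two edge types, word-to-context-word and word-identity-to-context-word. The function name `build_heterogeneous_network`, the `window` parameter, and the `word#identity` node naming are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A token in an identity-labeled corpus: (word, identity), e.g. ("apple", "food").
Token = Tuple[str, str]


def build_heterogeneous_network(
    docs: List[List[Token]], window: int = 2
) -> Dict[str, Counter]:
    """Count weighted edges of two ASSUMED types within a sliding window.

    - "word-word":     word node  <-> context word node
    - "identity-word": identity-specific word node ("word#identity") <-> context word
    The edge types and weighting are illustrative assumptions, not the paper's spec.
    """
    word_word: Counter = Counter()
    identity_word: Counter = Counter()
    for doc in docs:
        for i, (word, identity) in enumerate(doc):
            # Hypothetical node id that separates identities of the same word.
            identity_node = f"{word}#{identity}"
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue  # a token is not its own context
                context = doc[j][0]
                word_word[(word, context)] += 1
                identity_word[(identity_node, context)] += 1
    return {"word-word": word_word, "identity-word": identity_word}


if __name__ == "__main__":
    docs = [
        [("apple", "food"), ("pie", "food")],
        [("apple", "tech"), ("iphone", "tech")],
    ]
    net = build_heterogeneous_network(docs, window=1)
    print(net["word-word"][("apple", "pie")])              # 1
    print(net["identity-word"][("apple#food", "pie")])     # 1
    print(net["identity-word"][("apple#food", "iphone")])  # 0
```

In this toy example the plain word node `apple` is linked to both `pie` and `iphone`, while the identity nodes `apple#food` and `apple#tech` each keep only their own contexts. The weighted edge lists could then be passed to a network embedding method; the paper's specific embedding objective is not reproduced here.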

Similar resources

A novel key management scheme for heterogeneous sensor networks based on the position of nodes

Wireless sensor networks (WSNs) have many applications in the areas of commercial, military and environmental requirements. Regarding the deployment of low cost sensor nodes with restricted energy resources, these networks face a lot of security challenges. A basic approach to providing secure wireless communication in WSNs is to propose an efficient cryptographic key management protocol be...

Embedding Identity and Interest for Social Networks

Network embedding fills the gap of applying tuple-based data mining models to networked datasets through learning latent representations or embeddings. However, it may not be likely to associate latent embeddings with physical meanings just as the name, latent embedding, literally suggests. Hence, models built on embeddings may not be interpretable. In this paper, we thus propose to learn ident...

The Geometry of Culture: Analyzing Meaning through Word Embeddings

We demonstrate the utility of a new methodological tool, neural-network word embedding models, for large-scale text analysis, revealing how these models produce richer insights into cultural associations and categories than possible with prior methods. Word embeddings represent semantic relations between words as geometric relationships between vectors in a high-dimensional space, operationaliz...

Semi-Supervised Learning with Multi-View Embedding: Theory and Application with Convolutional Neural Networks

This paper presents a theoretical analysis of multi-view embedding – feature embedding that can be learned from unlabeled data through the task of predicting one view from another. We prove its usefulness in supervised learning under certain conditions. The result explains the effectiveness of some existing methods such as word embedding. Based on this theory, we propose a new semi-supervised l...

Perform Three Data Mining Tasks with Crowdsourcing Process

In data mining studies, because performing feature selection by hand is complex, some of the labeling must be sent to workers through crowdsourcing. The process of outsourcing data mining tasks to users is often handled by software systems without sufficient knowledge of the users' age or geographic location. Uncertainty about the performance of virtual user...

Journal:
  • CoRR

Volume: abs/1611.09878

Publication date: 2016